[misc] feat: Add bidirectional MLM <-> MBridge config translation script #2630
Conversation
…LI string override

Enable `model.activation_func=silu` (or `gelu`, `relu`, etc.) as a Hydra CLI override by serializing known activation functions as strings during OmegaConf conversion and resolving them back to callables in `TransformerConfig.finalize()`.

Changes:
- Add `ACTIVATION_FUNC_MAP` and `str_to_callable`/`callable_to_str` helpers in `omegaconf_utils.py`
- Serialize `activation_func` as a string instead of excluding it from OmegaConf, so Hydra overrides work
- Resolve string `activation_func` in `TransformerConfig.finalize()`, `MLATransformerConfig.finalize()`, and `HeterogeneousTransformerConfig.finalize()` before MCore post-init
- Add `vanilla_gpt_pretrain_config` recipe for MLM<->Bridge correlation testing with minimal defaults

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
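The helper pair named in the commit can be sketched as a round-trip between names and callables. This is a minimal illustration, not the PR's actual code: the pure-Python `relu`/`silu` functions below are stand-ins for the real `torch.nn.functional` entries in `ACTIVATION_FUNC_MAP`.

```python
import math

# Stand-ins for torch.nn.functional activations, so this sketch runs
# without torch; the PR's real map points at F.relu, F.silu, F.gelu, ...
def relu(x: float) -> float:
    return max(0.0, x)

def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))

ACTIVATION_FUNC_MAP = {"relu": relu, "silu": silu}
_CALLABLE_TO_STR = {fn: name for name, fn in ACTIVATION_FUNC_MAP.items()}

def callable_to_str(fn):
    """Serialize a known activation callable to its name for OmegaConf."""
    try:
        return _CALLABLE_TO_STR[fn]
    except KeyError:
        raise ValueError(f"Unknown activation callable: {fn!r}")

def str_to_callable(name):
    """Resolve a string from a Hydra override back to a callable."""
    try:
        return ACTIVATION_FUNC_MAP[name]
    except KeyError:
        raise ValueError(
            f"Unknown activation {name!r}; expected one of {sorted(ACTIVATION_FUNC_MAP)}"
        )

# Round-trip: dataclass callable -> OmegaConf string -> finalize() -> callable
assert str_to_callable(callable_to_str(silu)) is silu
```

The inverse map is derived from the forward one, so adding an activation in one place keeps serialization and deserialization consistent.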
/ok to test e152402
📝 Walkthrough

These changes introduce infrastructure for handling string-based activation functions in configuration systems. A new vanilla GPT pretraining recipe is added, along with utilities to convert between string representations and callable activation functions for serialization and deserialization across config variants.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/YAML Config
    participant OmegaConf
    participant StrToCallable as str_to_callable()
    participant TransformerConfig
    participant Resolve as _resolve_string_fields()
    participant Runtime
    User->>OmegaConf: Provide activation_func: "gelu"
    OmegaConf->>StrToCallable: Convert string to callable
    StrToCallable->>StrToCallable: Look up in ACTIVATION_FUNC_MAP
    StrToCallable-->>OmegaConf: Return F.gelu function
    OmegaConf->>TransformerConfig: Pass config with callable
    TransformerConfig->>Resolve: finalize() calls _resolve_string_fields()
    Resolve->>Resolve: Resolve any remaining string activation_funcs
    Resolve-->>TransformerConfig: String converted to callable
    TransformerConfig-->>Runtime: Config ready with resolved activation_func
```
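The finalize-time resolution step in the diagram might look like the simplified dataclass sketch below. Names are taken from the diagram; the real `TransformerConfig` wraps MCore's config and resolves more fields, and `silu` here is a pure-Python stand-in for `torch.nn.functional.silu`.

```python
import math
from dataclasses import dataclass
from typing import Callable, Union

def silu(x: float) -> float:
    # pure-Python stand-in for torch.nn.functional.silu
    return x / (1.0 + math.exp(-x))

ACTIVATION_FUNC_MAP = {"silu": silu}

@dataclass
class TransformerConfigSketch:
    # Hydra CLI overrides can only supply strings, so the field may hold either
    activation_func: Union[str, Callable] = silu

    def finalize(self) -> None:
        # counterpart of the diagram's _resolve_string_fields(): map any string
        # left by a CLI override back to a callable before MCore post-init
        if isinstance(self.activation_func, str):
            self.activation_func = ACTIVATION_FUNC_MAP[self.activation_func]

# As if the user passed `model.activation_func=silu` on the command line:
cfg = TransformerConfigSketch(activation_func="silu")
cfg.finalize()
assert cfg.activation_func is silu
```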
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ❌ 2 failed (2 warnings) | ✅ 2 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/megatron/bridge/recipes/gpt/vanilla_gpt.py (1)
15-29: Recipe name is too generic for repository naming conventions.

Please consider renaming `vanilla_gpt.py`/`vanilla_gpt_pretrain_config` to include explicit size/config metadata for easier discovery (e.g., `gpt_<size>_vanilla_pretrain`). As per coding guidelines, "Use descriptive recipe names that include the model size and configuration (e.g., llama3_8b.py, qwen2_7b_instruct.py)."

Also applies to: 51-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/recipes/gpt/vanilla_gpt.py` around lines 15 - 29, The recipe filename and identifier are too generic: rename the file and the recipe identifier (currently vanilla_gpt.py and vanilla_gpt_pretrain_config) to a descriptive name that includes model size/config (e.g., gpt_small_vanilla_pretrain or gpt_256x2_vanilla_pretrain), then update all internal references—docstring example, the recipe export/variable name (vanilla_gpt_pretrain_config), any entries used by run_recipe invocation, and any import/registration points that reference vanilla_gpt or vanilla_gpt_pretrain_config—so the repository consistently uses the new descriptive name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/megatron/bridge/training/utils/omegaconf_utils.py`:
- Around line 86-95: str_to_callable currently returns any attribute for dotted
paths even if it's not callable; update the dotted-path branch in
str_to_callable to verify the resolved attribute is callable before returning it
(use importlib.import_module and getattr as already done for module/attr
resolution), and if the attribute is found but not callable raise the same
ValueError used for unknown activation names (including the sorted
ACTIVATION_FUNC_MAP list for the message); ensure the function's return remains
a Callable and that non-callable attributes are rejected consistently with the
existing error path.
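The suggested fix can be sketched as follows: the dotted-path branch returns the resolved attribute only if it is callable, otherwise it falls through to the same `ValueError` used for unknown short names. The registry contents and exact message are placeholders, not the PR's code.

```python
import importlib
from typing import Callable

# name -> callable registry (empty here; contents are illustrative in the PR)
ACTIVATION_FUNC_MAP: dict = {}

def str_to_callable(name: str) -> Callable:
    if name in ACTIVATION_FUNC_MAP:
        return ACTIVATION_FUNC_MAP[name]
    if "." in name:
        module_path, _, attr = name.rpartition(".")
        obj = getattr(importlib.import_module(module_path), attr)
        # The review fix: a dotted path that resolves to a non-callable
        # (e.g. "math.pi") must be rejected, not silently returned.
        if callable(obj):
            return obj
    raise ValueError(
        f"Unknown activation {name!r}; expected one of {sorted(ACTIVATION_FUNC_MAP)} "
        "or a dotted path to a callable"
    )

assert str_to_callable("math.sqrt")(9.0) == 3.0
```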
---
Nitpick comments:
In `@src/megatron/bridge/recipes/gpt/vanilla_gpt.py`:
- Around line 15-29: The recipe filename and identifier are too generic: rename
the file and the recipe identifier (currently vanilla_gpt.py and
vanilla_gpt_pretrain_config) to a descriptive name that includes model
size/config (e.g., gpt_small_vanilla_pretrain or gpt_256x2_vanilla_pretrain),
then update all internal references—docstring example, the recipe
export/variable name (vanilla_gpt_pretrain_config), any entries used by
run_recipe invocation, and any import/registration points that reference
vanilla_gpt or vanilla_gpt_pretrain_config—so the repository consistently uses
the new descriptive name.
📒 Files selected for processing (4)
src/megatron/bridge/models/transformer_config.py
src/megatron/bridge/recipes/gpt/__init__.py
src/megatron/bridge/recipes/gpt/vanilla_gpt.py
src/megatron/bridge/training/utils/omegaconf_utils.py
Extract duplicated activation-function name<->callable mappings from `omegaconf_utils.py` and `ModelBridge.ACTIVATION_MAPPING` into a single shared module (`megatron.bridge.utils.activation_map`). Both consumers now reference the same registry, keeping CLI overrides, config finalization, and HF<->Megatron conversion in sync.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
/ok to test 4c4b987
/ok to test bec6e21
- Add translate_mlm_to_bridge.py for MLM→Bridge and Bridge→MLM translation
- Support --reverse flag for Bridge→MLM direction
- Build reverse mapping from ARG_MAP + extra tables
- Add MLM argparse introspection (optional, try/except guarded)
- Handle swiglu, squared_relu, mixed_precision special cases
- Output as CLI overrides, standalone recipe, or full torchrun command

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
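The reverse-mapping approach described in this commit can be illustrated with a toy subset. The real `ARG_MAP` and special-case tables in the script are much larger; these entries are illustrative only.

```python
# Forward table: MLM argparse flag -> Bridge Hydra override path
# (illustrative subset of the script's ARG_MAP)
ARG_MAP = {
    "--num-layers": "model.num_layers",
    "--hidden-size": "model.hidden_size",
    "--micro-batch-size": "train.micro_batch_size",
}

# The Bridge -> MLM direction is derived by inverting ARG_MAP,
# so the two directions cannot drift apart
REVERSE_MAP = {path: flag for flag, path in ARG_MAP.items()}

def mlm_to_bridge(flag, value):
    # Special case from the commit message: --swiglu is a boolean flag
    # that expands to two Bridge overrides
    if flag == "--swiglu":
        return ["model.activation_func=silu", "model.gated_linear_unit=true"]
    return [f"{ARG_MAP[flag]}={value}"]

assert mlm_to_bridge("--num-layers", 24) == ["model.num_layers=24"]
assert mlm_to_bridge("--swiglu", None) == [
    "model.activation_func=silu",
    "model.gated_linear_unit=true",
]
```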
bec6e21 to 38a82f9
/ok to test 38a82f9
Replace the unparseable `dataset.blend` Hydra CLI override with a simple `dataset.data_path` string field. `GPTDatasetConfig.finalize()` converts it to the blend tuple via MCore's `get_blend_from_list`, fixing the "ValueError: too many values to unpack" when specifying data paths from the command line. The translate script is updated in both directions.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
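A rough sketch of the described finalize step, using a local approximation of MCore's `get_blend_from_list` (the real function lives in `megatron.core` and handles more edge cases):

```python
def get_blend_from_list(values):
    """Approximation of MCore's get_blend_from_list: an alternating
    weight/prefix list becomes (prefixes, weights); a list of bare
    prefixes becomes (prefixes, None)."""
    def is_number(tok):
        try:
            float(tok)
            return True
        except ValueError:
            return False

    if values and is_number(values[0]):
        weights = [float(v) for v in values[0::2]]
        prefixes = list(values[1::2])
        return prefixes, weights
    return list(values), None

def finalize_dataset_config(data_path: str):
    # Sketch of GPTDatasetConfig.finalize(): split the flat CLI string
    # and build the blend tuple, avoiding the unparseable dataset.blend
    # Hydra override
    return get_blend_from_list(data_path.split())

assert finalize_dataset_config("0.3 /data/a 0.7 /data/b") == (
    ["/data/a", "/data/b"],
    [0.3, 0.7],
)
```

Because `dataset.data_path` is a plain string, Hydra can parse it from the command line without tripping over list syntax.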
/ok to test 18499ea
…mbeddings, and MoE args

- Map --max-position-embeddings to model.max_position_embeddings (was incorrectly model.seq_length)
- Map --seq-length to both model.seq_length and dataset.sequence_length (was only dataset.sequence_length)
- Map --distributed-timeout-minutes to dist.distributed_timeout_minutes (was skipped)
- Add --moe-router-pre-softmax mapping to model.moe_router_pre_softmax
- Auto-disable sequence_parallel when tensor_model_parallel_size <= 1 in TransformerConfig

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
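The `sequence_parallel` auto-disable rule above is simple enough to sketch directly. The helper name is hypothetical; in the PR the check lives inside `TransformerConfig`.

```python
def resolve_sequence_parallel(tensor_model_parallel_size: int,
                              sequence_parallel: bool) -> bool:
    # Sequence parallelism shards activations across tensor-parallel ranks;
    # with TP <= 1 there is nothing to shard, so the flag is force-disabled
    # instead of letting the config be rejected downstream.
    if tensor_model_parallel_size <= 1:
        return False
    return sequence_parallel

assert resolve_sequence_parallel(1, True) is False
assert resolve_sequence_parallel(4, True) is True
```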
/ok to test 7c3b6fa
Summary
Adds `scripts/translate_mlm_to_bridge.py` — a bidirectional translation script that converts between Megatron-LM (`pretrain_gpt.py`) CLI arguments and Megatron Bridge (`run_recipe.py`) Hydra-style overrides.

- Supports both MLM → Bridge and Bridge → MLM (`--reverse`) directions.
- Can emit a full `torchrun` command in both directions.

Motivation

Running correlation tests between Megatron-LM and Megatron Bridge requires matching configurations precisely across two very different config systems (argparse flags vs Hydra overrides on dataclass recipes). This script automates the translation, eliminating manual errors and making it easy to verify that both frameworks receive identical model/training parameters.
MLM → Bridge Translation

Translates Megatron-LM `pretrain_gpt.py` arguments (from YAML or CLI) into Bridge recipe overrides:

Bridge → MLM Translation

Translates Bridge Hydra overrides into Megatron-LM CLI arguments:
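A hypothetical example of this direction, with an illustrative mapping subset standing in for the script's real tables:

```python
# Illustrative Bridge override path -> MLM argparse flag subset
# (the script builds this by inverting its ARG_MAP)
OVERRIDE_TO_FLAG = {
    "model.num_layers": "--num-layers",
    "model.seq_length": "--seq-length",
    "train.global_batch_size": "--global-batch-size",
}

def bridge_to_mlm(overrides):
    """Turn `path=value` Hydra overrides into an MLM argv fragment."""
    flags = []
    for item in overrides:
        path, value = item.split("=", 1)
        flags.extend([OVERRIDE_TO_FLAG[path], value])
    return flags

assert bridge_to_mlm(["model.num_layers=24", "train.global_batch_size=256"]) == [
    "--num-layers", "24", "--global-batch-size", "256",
]
```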
Key mappings handled
- Model architecture: `num_layers`, `hidden_size`, `num_attention_heads`, `ffn_hidden_size`, `normalization`, `position_embedding_type`, `rotary_base`
- Activations: `swiglu` ↔ `activation_func=silu` + `gated_linear_unit=true`; `squared-relu` ↔ `activation_func=squared_relu`
- Training: `micro_batch_size`, `global_batch_size`, `train_iters`, `lr`, `weight_decay`, `clip_grad`, `seed`
- Parallelism: `tensor_model_parallel_size`, `pipeline_model_parallel_size`, `context_parallel_size`, `sequence_parallel`, `use_distributed_optimizer`
- Mixed precision: `bf16` ↔ `bf16_mixed`, `fp16` ↔ `fp16_mixed`
- Tokenizer: `tokenizer_type`, `tokenizer_model`, `vocab_size`

Test plan
/ok to test